week 7: multilevel models

multilevel tadpoles

multilevel models

We’re starting our unit on multilevel models, which can be thought of as models that “remember” features of clusters of data as they learn about all the clusters. The model will pool information across clusters (e.g., our estimates about cluster A will be informed in part by clusters B, C, and D). This tends to improve estimates about each cluster. Here are some other benefits of multilevel modeling:

  1. improved estimates for repeated sampling. When the same units are measured repeatedly, a single-level model tends to over- or underfit the data.
  2. improved estimates for imbalance in sampling. partial pooling prevents over-sampled clusters from unduly dominating inference, while still recognizing that larger clusters carry more information.
  3. estimates of variation. we model variation across clusters explicitly!
  4. avoid averaging, retain variation. pre-averaging manufactures false confidence (it artificially inflates precision) and introduces arbitrary data transformations.

Multilevel modeling should be your default approach.

review (or introduction?)

(Ignore probability and estimation for now)

\[ Y_i = b_0 + b_1X_{1i} + ... + \epsilon_i \] This assumes each observation \((i)\) is independent of all other observations. But what if our data were clustered in some way? We might be able to express each observation \(i\) as belonging to a specific cluster, \(j\).

\[ Y_{ij} = b_0 + b_1X_{1ij} + ... + \epsilon_{ij} \]

But what is the problem with this?
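One problem: observations that share a cluster are not independent — their residuals move together, violating the independence assumption on \(\epsilon_{ij}\). A minimal base-R simulation (made-up numbers, not the course data) makes this visible:

```r
# simulate clustered data: 30 clusters, 10 observations each
set.seed(1)
J <- 30; n <- 10
cluster <- rep(1:J, each = n)
u <- rnorm(J, 0, 1)                       # cluster-level deviations
y <- 2 + u[cluster] + rnorm(J * n, 0, 1)  # outcome with clustering

# residuals from a single-level (intercept-only) model
r <- resid(lm(y ~ 1))

# share of residual variance sitting between clusters; under
# independence this would be near 1/n = .10, but clustering
# pushes it much higher (here, around .5)
icc_hat <- var(tapply(r, cluster, mean)) / var(r)
icc_hat
```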

(Still non-Bayesian)

Intercept-only model

\[\begin{align*} \text{Level 1}&\\ Y_{ij} &= \beta_{0j} + \epsilon_{ij}\\ \text{Level 2}&\\ \beta_{0j} &= \gamma_{00} + U_{0j} \\ U_{0j} &\sim \text{Normal}(0, \tau^2_{00}) \\ \epsilon_{ij} &\sim \text{Normal}(0, \sigma^2) \\ \end{align*}\]

We can rewrite this as the “combined” model:

\[ Y_{ij} = \gamma_{00} + U_{0j} + \epsilon_{ij} \] Level 1 is the level of the data that repeat within your grouping or clustering variable. Is your cluster classrooms? Then students are Level 1. Is your cluster people? Then observations are Level 1.

Level 2 takes the parameters at Level 1 and decomposes them into a fixed component \((\gamma)\) that reflects the average and, if desired, the individual deviations around that fixed effect \((U)\).


example: multilevel people

data_path = "https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/mlm.csv"
d <- read.csv(data_path)

rethinking::precis(d)
             mean          sd      5.5%      94.5%    histogram
id    119.9954338 51.04319970 47.890000 209.020000 ▁▂▂▅▇▅▃▅▂▂▂▁
group         NaN          NA        NA         NA             
week    1.3981481  1.45595690  0.000000   3.000000 ▇▃▁▂▁▅▁▁▁▁▁▁
wave    1.7899543  0.79082439  1.000000   3.000000       ▇▇▁▂▁▁
con     0.1911393  0.07427626  0.083850   0.309936   ▁▂▃▇▅▃▁▁▁▁
dan     0.1946091  0.06434335  0.096652   0.296208     ▁▁▅▇▇▃▁▁
set.seed(114)
sample_id = sample(unique(d$id), replace=F, size=20)
d %>%
  filter(id %in% sample_id) %>% 
  ggplot(aes(x = week, y = con, group = id, fill = id)) + 
  geom_point(aes(color = factor(id))) + 
  stat_smooth(aes(color = factor(id)),
              method = "lm", se = FALSE) +
  theme(legend.position = "none")

What if we wanted to estimate each person’s conscientiousness score? One method would be to simply average scores for each person, but we lose a lot of information that way. Another option would be to treat each person as a group and model scores as a function of group. We can do this using an unpooled model.

\[\begin{align*} \text{con}_i &\sim \text{Normal}(\mu_i,\sigma) \\ \mu_i &= \alpha_{\text{id}[i]} \\ \alpha_j &\sim \text{Normal}(0, 1.5) \text{ for }j=1,...,91 \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

d$id.f = as.factor(d$id)

m1 <- brm(
  data=d,
  family=gaussian,
  bf(con ~ 0 + a,
     a ~ 0 + id.f,
     nl = TRUE),
  prior = c( prior(normal(0, 1.5), class=b, nlpar=a),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, chains=4, cores=4, seed=9,
  file=here("files/models/m71.1")
)
m1
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: con ~ 0 + a 
         a ~ 0 + id.f
   Data: d (Number of observations: 219) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a_id.f6       0.19      0.03     0.14     0.25 1.00     5625     3051
a_id.f29      0.12      0.03     0.06     0.19 1.00     6068     2824
a_id.f34      0.11      0.03     0.04     0.17 1.00     5718     3035
a_id.f36      0.16      0.03     0.09     0.23 1.00     6119     2895
a_id.f37      0.19      0.03     0.13     0.24 1.00     5445     3015
a_id.f48      0.22      0.03     0.16     0.27 1.00     5453     3081
a_id.f53      0.19      0.03     0.12     0.26 1.00     5551     2922
a_id.f54      0.25      0.03     0.19     0.30 1.00     5960     3024
a_id.f58      0.23      0.03     0.18     0.29 1.00     6524     2736
a_id.f61      0.09      0.03     0.02     0.16 1.00     5749     2323
a_id.f66      0.24      0.03     0.18     0.29 1.00     6296     2809
a_id.f67      0.20      0.02     0.15     0.25 1.00     6392     2847
a_id.f69      0.23      0.05     0.14     0.33 1.00     6454     3110
a_id.f71      0.06      0.03     0.01     0.11 1.00     5961     2871
a_id.f74      0.19      0.05     0.09     0.28 1.00     5496     2875
a_id.f75      0.27      0.03     0.20     0.34 1.00     6453     2771
a_id.f76      0.10      0.03     0.05     0.15 1.00     5279     2847
a_id.f78      0.24      0.03     0.19     0.30 1.00     6529     2775
a_id.f79      0.13      0.03     0.07     0.20 1.00     5353     3025
a_id.f80      0.14      0.03     0.07     0.20 1.00     4992     3154
a_id.f81      0.24      0.03     0.18     0.30 1.00     5350     2923
a_id.f82      0.33      0.02     0.28     0.37 1.00     5953     3181
a_id.f85      0.16      0.03     0.09     0.22 1.00     5285     2947
a_id.f86      0.12      0.03     0.05     0.19 1.00     5730     2870
a_id.f87      0.11      0.03     0.04     0.18 1.00     6111     3085
a_id.f89      0.14      0.03     0.08     0.19 1.00     6114     2961
a_id.f91      0.17      0.03     0.12     0.23 1.00     6302     3053
a_id.f92      0.20      0.03     0.15     0.26 1.00     5362     2901
a_id.f93      0.25      0.03     0.20     0.31 1.00     5458     3163
a_id.f94      0.12      0.03     0.06     0.18 1.00     5703     2502
a_id.f96      0.23      0.03     0.17     0.30 1.00     5478     3106
a_id.f97      0.25      0.02     0.20     0.29 1.00     5962     3126
a_id.f98      0.14      0.02     0.09     0.19 1.00     5907     2922
a_id.f99      0.18      0.02     0.13     0.23 1.00     5436     3042
a_id.f101     0.24      0.03     0.19     0.30 1.00     5774     2770
a_id.f102     0.23      0.05     0.13     0.33 1.00     5546     2841
a_id.f103     0.15      0.03     0.08     0.22 1.00     6545     2763
a_id.f104     0.17      0.03     0.11     0.24 1.00     5815     2877
a_id.f105     0.16      0.03     0.11     0.22 1.00     6403     2823
a_id.f106     0.20      0.03     0.14     0.25 1.00     5788     2810
a_id.f110     0.18      0.03     0.13     0.24 1.00     5848     2835
a_id.f112     0.09      0.03     0.04     0.15 1.00     6419     2807
a_id.f114     0.13      0.03     0.08     0.19 1.00     6196     3212
a_id.f115     0.14      0.03     0.09     0.20 1.00     5733     2479
a_id.f116     0.14      0.03     0.09     0.20 1.00     5518     2592
a_id.f120     0.35      0.03     0.29     0.42 1.00     6731     2725
a_id.f122     0.17      0.03     0.11     0.23 1.00     6214     2940
a_id.f125     0.20      0.03     0.14     0.25 1.00     5916     3061
a_id.f127     0.23      0.03     0.17     0.28 1.00     5723     3174
a_id.f129     0.13      0.03     0.06     0.20 1.00     5144     2627
a_id.f135     0.30      0.03     0.24     0.35 1.00     4810     2702
a_id.f136     0.29      0.03     0.24     0.35 1.00     5689     3004
a_id.f137     0.29      0.03     0.23     0.36 1.00     5267     3053
a_id.f140     0.13      0.03     0.06     0.19 1.00     5082     2662
a_id.f141     0.19      0.03     0.12     0.26 1.00     6459     2535
a_id.f142     0.18      0.03     0.11     0.24 1.00     5489     3121
a_id.f143     0.31      0.03     0.24     0.38 1.00     4659     2904
a_id.f144     0.17      0.03     0.11     0.24 1.00     5522     2935
a_id.f146     0.21      0.03     0.14     0.27 1.00     5840     3031
a_id.f149     0.23      0.03     0.16     0.29 1.00     5598     2872
a_id.f150     0.20      0.04     0.13     0.27 1.00     5041     3086
a_id.f152     0.22      0.03     0.16     0.29 1.00     5956     2740
a_id.f153     0.15      0.03     0.08     0.22 1.00     5164     2357
a_id.f155     0.14      0.03     0.08     0.20 1.00     6107     3095
a_id.f156     0.14      0.03     0.08     0.19 1.00     4763     3080
a_id.f159     0.12      0.03     0.06     0.19 1.00     6144     2853
a_id.f160     0.18      0.03     0.11     0.24 1.00     5525     3077
a_id.f162     0.39      0.03     0.34     0.45 1.00     5761     2859
a_id.f163     0.13      0.03     0.07     0.20 1.00     5954     2910
a_id.f165     0.20      0.03     0.13     0.26 1.00     5551     2634
a_id.f167     0.13      0.03     0.06     0.20 1.00     6029     2934
a_id.f169     0.24      0.03     0.18     0.31 1.00     5742     2835
a_id.f171     0.23      0.04     0.16     0.30 1.00     6125     2801
a_id.f174     0.26      0.03     0.20     0.33 1.00     5144     2945
a_id.f182     0.19      0.03     0.12     0.26 1.00     5523     2574
a_id.f187     0.06      0.03    -0.00     0.13 1.00     5601     3169
a_id.f189     0.22      0.03     0.16     0.29 1.00     6924     2991
a_id.f190     0.13      0.03     0.06     0.19 1.00     5857     3073
a_id.f193     0.21      0.03     0.14     0.28 1.00     5658     2451
a_id.f194     0.12      0.03     0.05     0.19 1.00     5872     3228
a_id.f201     0.19      0.03     0.12     0.25 1.00     6312     2938
a_id.f204     0.29      0.03     0.22     0.35 1.00     6373     3108
a_id.f205     0.19      0.03     0.13     0.26 1.00     6528     3011
a_id.f208     0.20      0.03     0.13     0.27 1.00     6211     2455
a_id.f209     0.21      0.03     0.15     0.28 1.00     6647     2275
a_id.f211     0.14      0.03     0.07     0.20 1.00     5620     2983
a_id.f214     0.28      0.03     0.21     0.35 1.00     5602     2885
a_id.f219     0.23      0.03     0.16     0.30 1.00     6497     2666
a_id.f222     0.14      0.03     0.07     0.20 1.00     6280     3085
a_id.f223     0.17      0.03     0.10     0.23 1.00     5317     3186
a_id.f229     0.14      0.03     0.07     0.21 1.00     5391     2633

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.05      0.00     0.04     0.05 1.00     1869     2510

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

This is inefficient, in that the model treats each person as entirely separate: nothing learned about one person informs estimates for any other. Let’s try a partial pooling model.

\[\begin{align*} \text{con}_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{id}[i]} \\ \alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma_{\alpha}) \text{ for } j = 1,...,91\\ \bar{\alpha} &\sim \text{Normal}(0, 1.5)\\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

m2 <- brm(
  data=d,
  family=gaussian,
  con ~ 1 + (1 | id), 
  prior = c( prior(normal(0, 1.5), class=Intercept),
             prior(exponential(1), class=sd),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, chains=4, cores=4, seed=9,
  file=here("files/models/m71.2")
)
m2
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: con ~ 1 + (1 | id) 
   Data: d (Number of observations: 219) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 91) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.06      0.01     0.05     0.07 1.00     1515     2201

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.19      0.01     0.18     0.20 1.00     2210     2482

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.05      0.00     0.04     0.05 1.00     2942     2835

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

how many parameters does each model have?

get_variables(m1) 
  [1] "b_a_id.f6"     "b_a_id.f29"    "b_a_id.f34"    "b_a_id.f36"   
  [5] "b_a_id.f37"    "b_a_id.f48"    "b_a_id.f53"    "b_a_id.f54"   
  [9] "b_a_id.f58"    "b_a_id.f61"    "b_a_id.f66"    "b_a_id.f67"   
 [13] "b_a_id.f69"    "b_a_id.f71"    "b_a_id.f74"    "b_a_id.f75"   
 [17] "b_a_id.f76"    "b_a_id.f78"    "b_a_id.f79"    "b_a_id.f80"   
 [21] "b_a_id.f81"    "b_a_id.f82"    "b_a_id.f85"    "b_a_id.f86"   
 [25] "b_a_id.f87"    "b_a_id.f89"    "b_a_id.f91"    "b_a_id.f92"   
 [29] "b_a_id.f93"    "b_a_id.f94"    "b_a_id.f96"    "b_a_id.f97"   
 [33] "b_a_id.f98"    "b_a_id.f99"    "b_a_id.f101"   "b_a_id.f102"  
 [37] "b_a_id.f103"   "b_a_id.f104"   "b_a_id.f105"   "b_a_id.f106"  
 [41] "b_a_id.f110"   "b_a_id.f112"   "b_a_id.f114"   "b_a_id.f115"  
 [45] "b_a_id.f116"   "b_a_id.f120"   "b_a_id.f122"   "b_a_id.f125"  
 [49] "b_a_id.f127"   "b_a_id.f129"   "b_a_id.f135"   "b_a_id.f136"  
 [53] "b_a_id.f137"   "b_a_id.f140"   "b_a_id.f141"   "b_a_id.f142"  
 [57] "b_a_id.f143"   "b_a_id.f144"   "b_a_id.f146"   "b_a_id.f149"  
 [61] "b_a_id.f150"   "b_a_id.f152"   "b_a_id.f153"   "b_a_id.f155"  
 [65] "b_a_id.f156"   "b_a_id.f159"   "b_a_id.f160"   "b_a_id.f162"  
 [69] "b_a_id.f163"   "b_a_id.f165"   "b_a_id.f167"   "b_a_id.f169"  
 [73] "b_a_id.f171"   "b_a_id.f174"   "b_a_id.f182"   "b_a_id.f187"  
 [77] "b_a_id.f189"   "b_a_id.f190"   "b_a_id.f193"   "b_a_id.f194"  
 [81] "b_a_id.f201"   "b_a_id.f204"   "b_a_id.f205"   "b_a_id.f208"  
 [85] "b_a_id.f209"   "b_a_id.f211"   "b_a_id.f214"   "b_a_id.f219"  
 [89] "b_a_id.f222"   "b_a_id.f223"   "b_a_id.f229"   "sigma"        
 [93] "lprior"        "lp__"          "accept_stat__" "stepsize__"   
 [97] "treedepth__"   "n_leapfrog__"  "divergent__"   "energy__"     

how many parameters does each model have?

get_variables(m1) %>% length()
[1] 100
get_variables(m2) %>% length()
[1] 103

What additional parameters?

m1 has a unique intercept for each participant (91 of them) and one residual standard deviation \((\sigma)\).

m2 estimates all of that plus a grand-mean intercept \((\bar{\alpha})\) and the variability of the person means \((\sigma_{\alpha})\).

(what’s the extra one? brms lists the intercept twice: once as b_Intercept and once as the internal Intercept it samples with mean-centered predictors.)

And yet!

m1 <- add_criterion(m1, criterion = "loo")
m2 <- add_criterion(m2, criterion = "loo")

loo_compare(m1, m2) %>% print(simplify=F)
   elpd_diff se_diff elpd_loo se_elpd_loo p_loo  se_p_loo looic  se_looic
m2    0.0       0.0   314.5     11.0        63.9    5.1   -628.9   22.0  
m1  -17.1       5.9   297.4      9.2        83.6    5.2   -594.7   18.5  

Let’s visualize the differences in these.

nd1 = distinct(d, id.f)
post1 = epred_draws(m1, nd1)
nd2 = distinct(d, id)
post2 = epred_draws(m2, nd2)
p1 = post1 %>% 
  ggplot( aes(y=.epred, x=id.f) ) +
  stat_gradientinterval() +
  scale_x_discrete(labels=NULL, breaks=NULL) +
  labs(x="id", y="con", title = "no pooling")

p2 = post2 %>% 
  mutate(id=as.factor(id)) %>% 
  ggplot( aes(y=.epred, x=id) ) +
  stat_gradientinterval() +
  scale_x_discrete(labels=NULL, breaks=NULL) +
  labs(x="id", y="con", title = "partial pooling")

p1 / p2
means1 = post1 %>% 
  mean_qi(.epred)
means2 = post2 %>% 
  mean_qi(.epred) %>% 
  mutate(id=as.factor(id))

means1 %>% 
  ggplot( aes(x=id.f, y=.epred)) +
  geom_hline( aes(yintercept=mean(.epred)),
              linetype="dashed") +
  geom_point( aes(color="no pooling") ) +
  geom_point( aes(x=id, color="partial pooling"),
              data=means2,
              size=2) +
  scale_color_manual( values=c("#e07a5f", "#1c5253") ) +
  scale_x_discrete(breaks=NULL) +
  labs(x="id", y="con")+
  theme(legend.position = "top")

\[\begin{align*} \text{con}_{i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{id}[i]} \\ \alpha_{j} &\sim \text{Normal}(\bar{\alpha}, \sigma_{\alpha}) \text{ for } j = 1,...,91\\ \bar{\alpha} &\sim \text{Normal}(.50,.25) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

Another way to write out the expected value \(\mu_i\) is:

\[ \mu_i = \bar{\alpha} + \alpha_{\text{id}[i]} \]

Where \(\bar{\alpha}\) is the grand mean (the mean of the person means) and \(\alpha_{\text{id}[i]}\) is that person’s deviation from it. There is one \(\bar{\alpha}\) for the entire sample, but a different \(\alpha_j\) for each person. Because there are multiple observations for each person, we can estimate the variability of those deviations, as well as the residual variability within each person.

\[\begin{align*} \text{con}_{i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \bar{\alpha} + \alpha_{\text{id}[i]} \\ \alpha_j &\sim \text{Normal}(0, \sigma_{\alpha}) \\ \bar{\alpha} &\sim \text{Normal}(.50,.25) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

In this reparameterization, we no longer need a separate prior centered on each person’s mean. Because the \(\alpha_j\) are deviations from the grand mean, they are modeled as Normal with mean 0; we only need to estimate their variability \((\sigma_{\alpha})\).

In other words, this model is analogous to an ANOVA, in which we calculate both within- and between-group (here, between-person) variability. Therefore, we’ll have a standard deviation at both levels.
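The two standard deviations let us compute an intraclass correlation (ICC): the share of total variance attributable to between-person differences. A sketch using the point estimates from the m2 summary above (sd(Intercept) ≈ 0.06, sigma ≈ 0.05):

```r
# ICC from the partial-pooling model's summary estimates
tau   <- 0.06  # between-person sd: sd(Intercept)
sigma <- 0.05  # within-person (residual) sd: sigma
icc <- tau^2 / (tau^2 + sigma^2)
round(icc, 2)  # 0.59: most of the variance in con is between people
```

A fuller version would compute this over the posterior draws of sd_id__Intercept and sigma rather than point estimates.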

magical tidybayes

spread_draws(m2, b_Intercept, r_id[id, term])
# A tibble: 364,000 × 7
# Groups:   id, term [91]
   .chain .iteration .draw b_Intercept    id term          r_id
    <int>      <int> <int>       <dbl> <int> <chr>        <dbl>
 1      1          1     1       0.184     6 Intercept  0.0319 
 2      1          1     1       0.184    29 Intercept -0.0178 
 3      1          1     1       0.184    34 Intercept -0.0763 
 4      1          1     1       0.184    36 Intercept -0.0297 
 5      1          1     1       0.184    37 Intercept  0.00844
 6      1          1     1       0.184    48 Intercept  0.0446 
 7      1          1     1       0.184    53 Intercept  0.0224 
 8      1          1     1       0.184    54 Intercept  0.0957 
 9      1          1     1       0.184    58 Intercept  0.00863
10      1          1     1       0.184    61 Intercept -0.00858
# ℹ 363,990 more rows
spread_draws(m2, b_Intercept, r_id[id, term]) %>% 
  mutate(p_mean = b_Intercept + r_id) %>% 
  filter(id %in% sample_id) %>% 
  ggplot(aes(x = id, y = p_mean)) + 
  geom_hline( aes(yintercept = mean(d$con)),
              linetype="dashed") +
  stat_halfeye(alpha =.5) 

In this figure, we can visualize three approaches to pooling:

  • NO POOLING: each person’s mean is calculated from only their own data; no information is shared across people. (black dots)
  • COMPLETE POOLING: any differences between people are treated as random noise. The mean of the whole group is the only estimate, and it is assumed to apply equally to each person (dashed line).
  • PARTIAL POOLING: information is shared across participants, but people are still allowed to differ. We regularize estimates of person-specific means toward the grand mean. The more information we have on a person, the less their estimate is regularized.
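The regularization in that last bullet can be sketched as a precision-weighted average of a person’s own mean and the grand mean. The numbers below are illustrative (loosely matched to the estimates above), not pulled from the model:

```r
grand_mean  <- 0.19  # complete-pooling estimate
person_mean <- 0.30  # no-pooling estimate for one person
tau   <- 0.06        # between-person sd
sigma <- 0.05        # within-person sd
n_j   <- 3           # number of observations for this person

# weight on the person's own data grows with n_j
w <- (n_j / sigma^2) / (n_j / sigma^2 + 1 / tau^2)
shrunk <- w * person_mean + (1 - w) * grand_mean
c(weight = round(w, 2), estimate = round(shrunk, 2))
```

With n_j = 30 the weight rounds to .98, which is why well-sampled people are barely shrunk toward the grand mean.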
actual_means = d %>% 
  with_groups(id, summarise, p_mean = mean(con))

spread_draws(m2, b_Intercept, r_id[id, term]) %>% 
  mutate(p_mean = b_Intercept + r_id) %>% 
  mean_qi(p_mean) %>% 
  filter(id %in% sample_id) %>% 
  ggplot( aes(x=id, y=p_mean) ) +
  geom_hline( aes(yintercept = mean(d$con)),
              linetype="dashed") +
  geom_point( size=2, color="#e07a5f") + 
  geom_point( data=filter(actual_means, 
                          id %in% sample_id)) +
  labs(x="id",y="con")

writing our multilevel model

It’s common to write out multilevel models using formulas for the different levels. Level 1 is the level of your outcome — the thing that repeats. Level 2 is the level of your groups.

\[\begin{align*} \text{Level 1} &\\ \text{con}_{ij} &= \alpha_i + \epsilon_{ij} \\ \text{Level 2} &\\ \alpha_i &= \gamma_0 + U_i \end{align*}\]

Some refer to \(U_i\) as a “random” or “varying” effect because it varies across groups. \(\gamma_0\) is therefore a “fixed” or “non-varying” effect because it applies to all groups.

drawing from the posterior

We’ve already seen how we can use spread_draws to get our parameter estimates.

m2 %>% spread_draws(b_Intercept, sigma, sd_id__Intercept) %>% head
# A tibble: 6 × 6
  .chain .iteration .draw b_Intercept  sigma sd_id__Intercept
   <int>      <int> <int>       <dbl>  <dbl>            <dbl>
1      1          1     1       0.184 0.0489           0.0641
2      1          2     2       0.180 0.0454           0.0633
3      1          3     3       0.180 0.0429           0.0708
4      1          4     4       0.177 0.0474           0.0666
5      1          5     5       0.184 0.0471           0.0617
6      1          6     6       0.191 0.0471           0.0548
m2 %>% spread_draws(r_id[id, term]) %>% head
# A tibble: 6 × 6
# Groups:   id, term [1]
     id term          r_id .chain .iteration .draw
  <int> <chr>        <dbl>  <int>      <int> <int>
1     6 Intercept  0.0319       1          1     1
2     6 Intercept -0.00463      1          2     2
3     6 Intercept -0.0217       1          3     3
4     6 Intercept -0.0230       1          4     4
5     6 Intercept  0.0167       1          5     5
6     6 Intercept -0.0143       1          6     6

We can get expected values (means) for individuals in our sample.

set.seed(9)
sample_id = sample(unique(d$id), replace=F, size=3)

nd = data.frame(id=sample_id, id.f = sample_id)

add_epred_draws(newdata=nd, object=m2) %>% 
  mean_qi()
# A tibble: 3 × 9
     id  id.f  .row .epred .lower .upper .width .point .interval
  <int> <int> <int>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1    48    48     3  0.213  0.162  0.264   0.95 mean   qi       
2   137   137     2  0.266  0.208  0.323   0.95 mean   qi       
3   146   146     1  0.205  0.148  0.262   0.95 mean   qi       

We can get predicted values (scores) for individuals in our sample.

add_predicted_draws(newdata=nd, object=m2) %>% 
  mean_qi()
# A tibble: 3 × 9
     id  id.f  .row .prediction .lower .upper .width .point .interval
  <int> <int> <int>       <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1    48    48     3       0.212 0.108   0.316   0.95 mean   qi       
2   137   137     2       0.266 0.156   0.375   0.95 mean   qi       
3   146   146     1       0.203 0.0935  0.311   0.95 mean   qi       

expected values vs predicted values

expected = add_epred_draws(newdata=nd, object=m2)
predicted = add_predicted_draws(newdata=nd, object=m2)

full_join(expected, predicted) %>% 
  filter(.draw <= 100) %>% 
  ungroup() %>% 
  ggplot( aes(x=id, y=.epred)) +
  stat_halfeye( aes(fill="expected"),
                alpha=.3) +
  stat_halfeye( aes(y =.prediction, fill="predicted"), 
                alpha=.3 ) +
  scale_fill_manual( values=c("#e07a5f","#1c5253") ) +
  labs(x="id", y="con", fill="draw") +
  scale_x_continuous(breaks=sample_id) +
  theme(legend.position = "top")
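The predicted distributions in the figure are wider than the expected ones because each predicted draw adds residual noise on top of a draw for the mean. A base-R sketch of that mechanism, with illustrative numbers roughly matched to the intervals above:

```r
set.seed(9)
# expected-value draws: uncertainty about one person's mean only
mu_draws <- rnorm(4000, mean = 0.21, sd = 0.026)

# predicted-value draws: the same mean uncertainty plus residual sigma
pred_draws <- rnorm(4000, mean = mu_draws, sd = 0.05)

quantile(mu_draws,   c(.025, .975))  # narrow interval: where the mean lies
quantile(pred_draws, c(.025, .975))  # wide interval: where a new score lies
```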

Finally, we can get expected values for new individuals.

new_id = max(d$id) + 1:3

nd = data.frame(id=new_id)

add_epred_draws(newdata=nd, object=m2, 
                allow_new_levels=T) %>% 
  ggplot( aes(x=id, y=.epred) ) +
  stat_halfeye()

adding covariates

Let’s add time to our model. Because time varies (can change) within person from assessment to assessment, this is a Level 1 variable. Note that I have NOT added a varying component \((U)\) to the slope at Level 2 – in other words, I’m stating that the effect of time on conscientiousness is fixed, or identical across participants.

\[\begin{align*} \text{Level 1} &\\ \text{con}_{ij} &= \alpha_i + \beta_i(\text{week}_{ij}) + \epsilon_{ij} \\ \text{Level 2} &\\ \alpha_i &= \gamma_0 + U_{0i} \\ \beta_i &= \gamma_1 \\ \end{align*}\]

m3 <- brm(
  data=d,
  family=gaussian,
  con ~ 1 + week + (1 | id), 
  prior = c( prior(normal(.50, .25), class=Intercept),
             prior(normal(0, 1), class=b), #new prior for new term
             prior(exponential(1), class=sd),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, chains=4, cores=4, seed=9,
  file=here("files/models/m71.3")
)
m3
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: con ~ 1 + week + (1 | id) 
   Data: d (Number of observations: 216) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 88) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.06      0.01     0.05     0.07 1.01     1237     2024

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.19      0.01     0.18     0.21 1.00     2132     2576
week         -0.00      0.00    -0.01     0.00 1.00     5783     3151

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.05      0.00     0.04     0.05 1.00     2931     2859

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

There are different intercepts for each participant, but not different slopes.

get_variables(m3)
  [1] "b_Intercept"         "b_week"              "sd_id__Intercept"   
  [4] "sigma"               "Intercept"           "r_id[6,Intercept]"  
  [7] "r_id[29,Intercept]"  "r_id[34,Intercept]"  "r_id[36,Intercept]" 
 [10] "r_id[37,Intercept]"  "r_id[48,Intercept]"  "r_id[53,Intercept]" 
 [13] "r_id[54,Intercept]"  "r_id[58,Intercept]"  "r_id[61,Intercept]" 
 [16] "r_id[66,Intercept]"  "r_id[67,Intercept]"  "r_id[71,Intercept]" 
 [19] "r_id[75,Intercept]"  "r_id[76,Intercept]"  "r_id[78,Intercept]" 
 [22] "r_id[79,Intercept]"  "r_id[80,Intercept]"  "r_id[81,Intercept]" 
 [25] "r_id[82,Intercept]"  "r_id[85,Intercept]"  "r_id[86,Intercept]" 
 [28] "r_id[87,Intercept]"  "r_id[89,Intercept]"  "r_id[91,Intercept]" 
 [31] "r_id[92,Intercept]"  "r_id[93,Intercept]"  "r_id[94,Intercept]" 
 [34] "r_id[96,Intercept]"  "r_id[97,Intercept]"  "r_id[98,Intercept]" 
 [37] "r_id[99,Intercept]"  "r_id[101,Intercept]" "r_id[103,Intercept]"
 [40] "r_id[104,Intercept]" "r_id[105,Intercept]" "r_id[106,Intercept]"
 [43] "r_id[110,Intercept]" "r_id[112,Intercept]" "r_id[114,Intercept]"
 [46] "r_id[115,Intercept]" "r_id[116,Intercept]" "r_id[120,Intercept]"
 [49] "r_id[122,Intercept]" "r_id[125,Intercept]" "r_id[127,Intercept]"
 [52] "r_id[129,Intercept]" "r_id[135,Intercept]" "r_id[136,Intercept]"
 [55] "r_id[137,Intercept]" "r_id[140,Intercept]" "r_id[141,Intercept]"
 [58] "r_id[142,Intercept]" "r_id[143,Intercept]" "r_id[144,Intercept]"
 [61] "r_id[146,Intercept]" "r_id[149,Intercept]" "r_id[150,Intercept]"
 [64] "r_id[152,Intercept]" "r_id[153,Intercept]" "r_id[155,Intercept]"
 [67] "r_id[156,Intercept]" "r_id[159,Intercept]" "r_id[160,Intercept]"
 [70] "r_id[162,Intercept]" "r_id[163,Intercept]" "r_id[165,Intercept]"
 [73] "r_id[167,Intercept]" "r_id[169,Intercept]" "r_id[171,Intercept]"
 [76] "r_id[174,Intercept]" "r_id[182,Intercept]" "r_id[187,Intercept]"
 [79] "r_id[189,Intercept]" "r_id[190,Intercept]" "r_id[193,Intercept]"
 [82] "r_id[194,Intercept]" "r_id[201,Intercept]" "r_id[204,Intercept]"
 [85] "r_id[205,Intercept]" "r_id[208,Intercept]" "r_id[209,Intercept]"
 [88] "r_id[211,Intercept]" "r_id[214,Intercept]" "r_id[219,Intercept]"
 [91] "r_id[222,Intercept]" "r_id[223,Intercept]" "r_id[229,Intercept]"
 [94] "lprior"              "lp__"                "accept_stat__"      
 [97] "stepsize__"          "treedepth__"         "n_leapfrog__"       
[100] "divergent__"         "energy__"           
set.seed(9)
sample_id = sample(unique(d$id), replace=F, size = 20)
distinct(d, id, week) %>% 
  filter(id %in% sample_id) %>% 
  add_epred_draws(m3) %>% 
  mean_qi() %>% 
  ggplot( aes(x=week, y=.epred, color=as.factor(id))) +
  geom_point() +
  geom_line() +
  guides(color = "none") +
  labs(x="week",y="con")

level 2 covariates

We can also add covariates at Level 2 – the person level, in this case. We have a binary variable, group, that takes a single value for each person.

\[\begin{align*} \text{Level 1} &\\ \text{con}_{ij} &= \alpha_i + \beta_i(\text{week}_{ij}) + \epsilon_{ij} \\ \text{Level 2} &\\ \alpha_i &= \gamma_{0\text{group}[i]} + U_{0i} \\ \beta_i &= \gamma_{1} \\ \end{align*}\]

m4 <- brm(
  data=d,
  family=gaussian,
    con ~ 0 + week + group + (1 | id), 
  prior = c( prior(normal(0, 1), class=b), # this prior still works!
             prior(exponential(1), class=sd),
             prior(exponential(1), class=sigma)),
  iter=5000, warmup=1000, chains=4, cores=4, seed=9,
  file=here("files/models/m71.4")
)
m4
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: con ~ 0 + week + group + (1 | id) 
   Data: d (Number of observations: 216) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~id (Number of levels: 88) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.06      0.01     0.05     0.07 1.00     5193     9057

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
week         -0.00      0.00    -0.01     0.00 1.00    26657    13146
groupCTRL     0.20      0.01     0.18     0.23 1.00     7225     9612
groupPD       0.19      0.01     0.17     0.20 1.00     6794     9456

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.05      0.00     0.04     0.05 1.00    11021    11211

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
get_variables(m4)
  [1] "b_week"              "b_groupCTRL"         "b_groupPD"          
  [4] "sd_id__Intercept"    "sigma"               "r_id[6,Intercept]"  
  [7] "r_id[29,Intercept]"  "r_id[34,Intercept]"  "r_id[36,Intercept]" 
 [10] "r_id[37,Intercept]"  "r_id[48,Intercept]"  "r_id[53,Intercept]" 
 [13] "r_id[54,Intercept]"  "r_id[58,Intercept]"  "r_id[61,Intercept]" 
 [16] "r_id[66,Intercept]"  "r_id[67,Intercept]"  "r_id[71,Intercept]" 
 [19] "r_id[75,Intercept]"  "r_id[76,Intercept]"  "r_id[78,Intercept]" 
 [22] "r_id[79,Intercept]"  "r_id[80,Intercept]"  "r_id[81,Intercept]" 
 [25] "r_id[82,Intercept]"  "r_id[85,Intercept]"  "r_id[86,Intercept]" 
 [28] "r_id[87,Intercept]"  "r_id[89,Intercept]"  "r_id[91,Intercept]" 
 [31] "r_id[92,Intercept]"  "r_id[93,Intercept]"  "r_id[94,Intercept]" 
 [34] "r_id[96,Intercept]"  "r_id[97,Intercept]"  "r_id[98,Intercept]" 
 [37] "r_id[99,Intercept]"  "r_id[101,Intercept]" "r_id[103,Intercept]"
 [40] "r_id[104,Intercept]" "r_id[105,Intercept]" "r_id[106,Intercept]"
 [43] "r_id[110,Intercept]" "r_id[112,Intercept]" "r_id[114,Intercept]"
 [46] "r_id[115,Intercept]" "r_id[116,Intercept]" "r_id[120,Intercept]"
 [49] "r_id[122,Intercept]" "r_id[125,Intercept]" "r_id[127,Intercept]"
 [52] "r_id[129,Intercept]" "r_id[135,Intercept]" "r_id[136,Intercept]"
 [55] "r_id[137,Intercept]" "r_id[140,Intercept]" "r_id[141,Intercept]"
 [58] "r_id[142,Intercept]" "r_id[143,Intercept]" "r_id[144,Intercept]"
 [61] "r_id[146,Intercept]" "r_id[149,Intercept]" "r_id[150,Intercept]"
 [64] "r_id[152,Intercept]" "r_id[153,Intercept]" "r_id[155,Intercept]"
 [67] "r_id[156,Intercept]" "r_id[159,Intercept]" "r_id[160,Intercept]"
 [70] "r_id[162,Intercept]" "r_id[163,Intercept]" "r_id[165,Intercept]"
 [73] "r_id[167,Intercept]" "r_id[169,Intercept]" "r_id[171,Intercept]"
 [76] "r_id[174,Intercept]" "r_id[182,Intercept]" "r_id[187,Intercept]"
 [79] "r_id[189,Intercept]" "r_id[190,Intercept]" "r_id[193,Intercept]"
 [82] "r_id[194,Intercept]" "r_id[201,Intercept]" "r_id[204,Intercept]"
 [85] "r_id[205,Intercept]" "r_id[208,Intercept]" "r_id[209,Intercept]"
 [88] "r_id[211,Intercept]" "r_id[214,Intercept]" "r_id[219,Intercept]"
 [91] "r_id[222,Intercept]" "r_id[223,Intercept]" "r_id[229,Intercept]"
 [94] "lprior"              "lp__"                "accept_stat__"      
 [97] "stepsize__"          "treedepth__"         "n_leapfrog__"       
[100] "divergent__"         "energy__"           
set.seed(9)
sample_id = sample(unique(d$id), replace=F, size = 20)
distinct(d, id, group, week) %>% 
  filter(id %in% sample_id) %>% 
  add_epred_draws(m4) %>% 
  mean_qi() %>% 
  ggplot( aes(x=week, y=.epred, group = as.factor(id), color=group)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values=c("#1c5253" , "#e07a5f")) +
  labs(x="week",y="con") +
  theme(legend.position = "top")

So far, we’ve only added covariates to (and allowed variation in) the estimates of the intercept. But slope parameters can vary too, and covariates on slopes produce interactions (the effect of variable X on Y changes as a function of variable Z). Either way, we introduce additional parameters into our model.

\[\begin{align*} \text{Level 1} &\\ \text{con}_{ij} &= \alpha_i + \beta_i(\text{week}_{ij}) + \epsilon_{ij} \\ \text{Level 2} &\\ \alpha_i &= \gamma_{0} + U_{0i} \\ \beta_i &= \gamma_{1} + U_{1i} \\ \end{align*}\]

m5 <- brm(
  data=d,
  family=gaussian,
    con ~ 1 + week + (1 + week | id), 
  prior = c( prior(normal(0, 1), class=Intercept),
             prior(normal(0, 1), class=b), 
             prior(exponential(1), class=sd),
             prior(exponential(1), class=sigma)),
  iter=5000, warmup=1000, chains=4, cores=4, seed=9,
  file=here("files/models/m71.5")
)

Note the new hyperparameter, cor(Intercept, week): the model also estimates how person-level intercept and slope deviations covary.

m5
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: con ~ 1 + week + (1 + week | id) 
   Data: d (Number of observations: 216) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~id (Number of levels: 88) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)           0.06      0.01     0.05     0.07 1.00     3048     3184
sd(week)                0.00      0.00     0.00     0.01 1.00     1416      973
cor(Intercept,week)    -0.07      0.48    -0.89     0.90 1.00     8793     8255

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.19      0.01     0.18     0.21 1.00     3511     5895
week         -0.00      0.00    -0.01     0.00 1.00    10237     7760

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.05      0.00     0.04     0.05 1.00     1856     1084

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
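What does cor(Intercept, week) mean? The model draws each person’s intercept and slope deviations \((U_{0i}, U_{1i})\) from a joint distribution with a correlation. Here’s a base-R sketch of that draw using the point estimates above (sd(week) prints as 0.00, so I plug in a small nonzero stand-in for illustration; MASS ships with R):

```r
library(MASS)  # for mvrnorm
set.seed(9)

sds   <- c(0.06, 0.004)                # sd(Intercept), stand-in for sd(week)
Omega <- matrix(c(    1, -0.07,
                  -0.07,     1), 2, 2) # cor(Intercept, week)
Sigma <- diag(sds) %*% Omega %*% diag(sds)

# one (U_0i, U_1i) pair per person; empirical = TRUE makes the
# sample moments match Sigma exactly
U <- mvrnorm(88, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)
round(cor(U[, 1], U[, 2]), 2)  # -0.07
```

People with higher intercepts would have (very slightly) shallower week slopes, though the wide interval on this correlation in the summary above means the data say little about its sign.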